Exploring lateral genetic transfer among microbial genomes using TF-IDF
Abstract
Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on ideas from document analysis, term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bacterial and three archaeal phyla. We investigate the effect of k-mer size, gap size and delineation of groups on the inference of genomic regions of lateral origin, finding an interplay among these parameters and sequence divergence. Because TF-IDF identifies donor groups and delineates regions of lateral origin within recipient genomes, aggregating these regions by gene enables us to explore, for the first time, the mosaic nature of lateral genes including the multiplicity of biological sources, ancestry of transfer and over-writing by subsequent transfers. We carry out Gene Ontology enrichment tests to investigate which biological processes are potentially affected by LGT.
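The idea of treating k-mers as "terms" and groups of genomes as "documents" can be illustrated with a small sketch. This is not the authors' implementation. It assumes the textbook TF-IDF weighting (relative frequency times the log of inverse group frequency) and a simple interpretation of the gap parameter as the largest distance allowed between two hits merged into one region. The function names (kmer_tfidf, foreign_hits, merge_hits) and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_tfidf(groups, k):
    """Score k-mers per group with textbook TF-IDF (an assumed weighting).

    groups maps a group name to a list of genome sequences. Each group is
    treated as one document and each k-mer as one term.
    """
    counts = {g: Counter(km for s in seqs for km in kmers(s, k))
              for g, seqs in groups.items()}
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # number of groups containing each k-mer
    n_groups = len(groups)
    scores = {}
    for g, c in counts.items():
        total = sum(c.values())
        scores[g] = {km: (n / total) * math.log(n_groups / doc_freq[km])
                     for km, n in c.items()}
    return scores


def foreign_hits(query, own_group, scores, k):
    """List (position, group) for k-mers most characteristic of another group."""
    hits = []
    for pos, km in enumerate(kmers(query, k)):
        best = max(scores, key=lambda g: scores[g].get(km, 0.0))
        if scores[best].get(km, 0.0) > 0 and best != own_group:
            hits.append((pos, best))
    return hits


def merge_hits(hits, k, max_gap):
    """Merge position-sorted hits from the same group into regions.

    Two hits are merged when the next one starts at most max_gap bases
    after the current region ends (an assumed reading of "gap size").
    Regions are (start, end_exclusive, group).
    """
    regions = []
    for pos, group in hits:
        if regions and regions[-1][2] == group and pos - regions[-1][1] <= max_gap:
            regions[-1] = (regions[-1][0], pos + k, group)
        else:
            regions.append((pos, pos + k, group))
    return regions


# Toy data: a recipient from group A carrying a segment typical of group B.
groups = {"A": ["ACGTACGT"], "B": ["TTTTGGGG"], "C": ["CCCCAAAA"]}
scores = kmer_tfidf(groups, k=4)
query = "ACGTACGT" + "TTTTGGGG"
regions = merge_hits(foreign_hits(query, "A", scores, k=4), k=4, max_gap=0)
```

In this toy case the only region reported spans the appended segment and is attributed to group B. A k-mer present in every group scores zero, so it cannot support an inference of transfer; this is one way to see why k-mer size and group delineation interact with sequence divergence.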
Similar resources
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF
Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to co...
A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF
Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large data. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statistics to detect not only regions of lateral origin, but also their origin and direction of transfer...
Genetic transfer in Staphylococcus: a case study of 13 genomes
The widespread presence of antibiotic resistance and virulence among Staphylococcus isolates has been attributed to lateral genetic transfer (LGT) between different strains or species. However, there has been very little study of the extent of LGT in Staphylococcus species using a phylogenetic approach, particularly of the units of such genetic transfer. Here we report the first systematic stud...
Lateral transfer of genes and gene fragments in Staphylococcus extends beyond mobile elements.
The widespread presence of antibiotic resistance and virulence among Staphylococcus isolates has been attributed in part to lateral genetic transfer (LGT), but little is known about the broader extent of LGT within this genus. Here we report the first systematic study of the modularity of genetic transfer among 13 Staphylococcus genomes covering four distinct named species. Using a topology-bas...
Distinguishing Microbial Genome Fragments Based on Their Composition: Evolutionary and Comparative Genomic Perspectives
It is well known that patterns of nucleotide composition vary within and among genomes, although the reasons why these variations exist are not completely understood. Between-genome compositional variation has been exploited to assign environmental shotgun sequences to their most likely originating genomes, whereas within-genome variation has been used to identify recently acquired genetic mate...